BigDFT.BioQM module

A module to define typical operations that can be done on biological systems

load(archive, serialization_version=None, options={})[source]

Create a Biosystem from a serialization archive. It extracts the archive files in a temporary directory and loads the related information. Also it deletes the directory once the load is finished.

Parameters
  • archive (path) – the path of the serialization archive

  • serialization_version (str) – version of the archive. Latest if not specified.

Returns

Instance of the class, loaded from the archive

Return type

BioSystem

serialize_biosystem(archive, posinp, logfile, version=None, **kwargs)[source]

Serialize the biosystem associated to the provided files

Parameters
  • archive (path) – name of the archive to be employed for dumping

  • posinp (path) – pdbfile of the structure

  • logfile (path) – logfile of the structure

  • version (str) – version of the serialization

  • kwargs (**) – Arguments to be passed to BioSystem instantiation

class BioSystemSerialization(biosys, version=None)[source]

Serialization of the BioSystem class that would prevent the need to copy the entire matrices of the QM calculations

Parameters
  • biosys (BioSystem) – a instance of the Biosystem class

  • version (str) – the version of the desired serialization. If absent, the default is considered.

version = '1.1'

version of the serialization

cached_attributes = {'1.0': {'cached_refragment': '_cached_refragment', 'fragindices': 'frag_indices', 'pairwise_BO': 'bond_orders', 'purities': 'purities'}, '1.1': {'interactions': 'interactions'}, '1.2': {'fragment_quantities': '_fragment_quantities'}, '2.0': {'atomic_BO': 'atomic_bond_orders', 'atomic_interactions': 'atomic_interactions', 'atomic_purities': '_atomic_purities', 'cached_refragment': '_cached_refragment'}}
dump(archive)[source]

Create an archive with the entire set of information of the Serialization. Such a tarfile should be such that the same analysis of the BioSystem is possible

class Sequence(seq)[source]

Initialize and define a sequence, may be amminoacidic or nitrogenous.

Parameters

seq (Bio.Seq.Sequence) – the starting sequence

display(colors=None, labels=None, axs=None, boxcount=None, remove_letters=[])[source]

Display the sequence in a easy-to-read form

Parameters
  • boxcount (int) – the number of boxes per row

  • remove_letters (list) – list of letter positions that will be removed

pdb_sequences(filename)[source]

Sequences of the chains of the proteins, in the order of the residue_list

Parameters

filename (str) – protein file, in pdb format

Returns

strings of the protein sequence in the FASTA sequence order

Return type

list

read_structure(filename, name='protein')[source]

Extract a biological structure from a PDB file. Use the package Bio.PDB

Parameters
  • filename (str) – path to the file of the structure

  • name (str) – the name of the structure

Returns

(Bio.PDB.Structure) A structure instance

residue_list(structure, system, strict=True)[source]

Identify the residues of the sequence in the fragment of the system

Parameters
  • structure (structure) – A structure read from the PDB parser of Bio.PDB

  • system (BigDFT.Systems.System) – a system coming from BigDFT package

  • strict (bool) – define if the identification has to be performend via centroids control or just by the residue name

Returns

tuple of list the fragments of the system in the order of the

residues of the structure. Some of the fragments may be missing as they may have been merged of not associated to elements of the sequence. The second list is the list of the ordered residues.

Return type

tuple

sequences_to_residues(reslist, chains)[source]

Identify for each element of the chains which are the residues that may be associated to them.

Parameters
  • reslist (list) – the list of all the residues of the system

  • chains (list) – a list containing the sequences of the system

Returns

a list of length of the chains indicating the lookup array for the

sequence

Return type

list

class Graph(fragkeys, bond_orders, frag_labels, cutoff=0.01, restrict_to=None)[source]

A Graph representation of a System or a subportion of it.

Identify the connectivity of the system given a threshold, possibly restricted to a subset of fragments :param cutoff: the threshold of the cumulative fragment bond order

that is applied to define the connectivity

Parameters

restrict_to (list) – list of the fragment id to which calculate the connectivity

display(colors=None, axs=None, edge_labels=None, node_shapes=None, edge_colors=None, node_edge_colors=None, colorbar_mappable=None, colorbar_label='', **kwargs)[source]

Draw the graph in the Kamada Kawai representation. Employ colors if present.

Parameters
  • colors (list) – the list of the colors that have to be employed for all the graph nodes. The list has the size of the full keys list. Only target nodes are employed.

  • colorbar_mappable (matplotlib.colors.ScalarMappable) – mappable to be employed to draw a colorbar associated to the node colors.

  • colorbar_label (str) – label to be associated to the colorbar

  • edge_colors (list) – the list of the colors that have to be employed for each of the edges of the node

  • axs (matplotlib.axis) – the axis to draw on (optional)

  • edge_labels (dict) – a dictionary from tuples of node labels to a given label.

  • node_shapes (list) – list indicating the shapes of the nodes, in order of full keys. Only target nodes are used for the shape.

  • node_edge_colors (list) – the list of the colors that have to be employed for all the graph node borders. Same specs than the colors argument.

  • kwargs (dict) – any other argument you wish to pass to networkx. Those arguments are arranged in keys, among which common, nodes, edges, labels, edge_labels. Each values of the keys correspond to the arguments which should be passed to the corresponding draw_networkx_<> method.

construct_frag_tuple(frag)[source]

Build the fragment tuple given the fragment name.

Parameters

frag (str) – the fragment name

Returns

the (chain,fragname,fragno) tuple name

Return type

tuple

construct_res_tuple(res)[source]

Build the BigDFT fragment tuple given the residue of the structure

Parameters

res (Residue) – A residue Class on the Biopython package

Returns

the (chain,fragname,fragno) tuple

Return type

tuple

residue_name(res)[source]

Name of the residue in the fragment specification of PyBigDFT

Parameters

res (Residue) – A residue Class on the Biopython package

Returns

name of the fragment

Return type

str

biosystem_fragmentation(sys, structure)[source]

Refragment the system names according to the residues name of the structure from the biopython module.

Parameters
  • sys (System) – and instance of the BioSystem class, fragmented in a

  • way (traditional) –

  • structure (Bio.Structure) – a structure coming from the BioPython module

Returns

a system with a fragmentation that is dependent of the

chain of the molecules

Return type

System

sequences_to_fragments(chains_to_residues, frag_to_residues)[source]

Defines the lookup array of the sequences to the fragments

Parameters
  • chains_to_residues (list) – lookup of the chains to the list of recogized residues

  • frag_to_residues (list) – name of the associated fragments in the order of residues list

Returns

array of the name of the fragments in the same representation of

chain_to_residues

Return type

list

class BioSystem(filename, sys=None, structure=None, sequence_from_fragments=False, disable_warnings=False, **kwargs)[source]

Initialize a System for a Biological analysis. Retrieve, from the initial file, the basic fragments, the system sequence, their mutual interactions and the informations about the purity and the observables

Parameters
  • filename (str) – a pdb file with fragment information inside

  • sys (System) –

    an initial system, useful for a custom refragmentation.

    If present, overrides the association written in the pdbfile. It is user’s responsibility to ensure coherency of the total number of atoms.

    sequence_from_fragments (bool): method to determine the sequences.

    If True, the sequence are determined by the fragment names. It is user’s responsibility to provide a inputfile that has a set of residues compatible with the fragmentation.

    disable_warnings (bool): disable the BioPython warnings if true structure (Bio.Structure): a system structure that is meant to replace

    the one genrated by the file parser. Useful for corrupted pdb files. To be used with sequence from_fragments

    **kwargs: other arguments to be passed to py:func:~BigDFT.IO.read_pdb

classmethod from_sys(system, attributes={}, **kwargs)[source]

Create a BioSystem instance from a previously existing system.

Parameters
  • system (System) – the system instance.

  • attributes (dict) – attributes of the Biosystem that will be set from the user.

  • **kwargs – other arguments that have to be passed to the BioSystem instantiation.

Returns

a BioSystem instance

Return type

BioSystem

classmethod from_logfile(logfile, **kwargs)[source]

Create a BioSystem instance from the information in a logfile.

Parameters
  • logfile (str) – path of the logfile.

  • **kwargs – other arguments that have to be passed to the BioSystem instantiation.

Returns

a System population instance

Return type

BioSystem

set_archive_name(archive)[source]

Original archive name.

Set the name of the original archive from which this biosystem comes from.

Parameters

archive (str) – archive name

set_qm_run(log)[source]

Associate the system to a QM run that has been performed by BigDFT. Such run assumes that the positions of the system atoms are consistent with the PDB

Parameters

log (Logfiles.Logfile) – A Logfile of the BigDFT run

refragment(cutoff=0.05, view=None, groups=[])[source]

Identify the system’s fragmentation and represent it in terms of the systems’ residues. Employs the user defined cutoff to find the correct fragmentation

Parameters
  • cutoff (float) – cutoff employed for the refragmentation.

  • view (dict) – the view to be imposed as a starting fragmentation in the system.

  • groups (list) – list of groups of the fragments view that will be employed as an initial guess for the fragmentation.

chessboards(cutoffs, initial_view=None, initial_groups=[])[source]

Define a coherent group of fragmentations that are provided from the loosest to tightest cutoffs. The views of the fragmentations are defined such as the tighest cutoffs always contain regrouping of fragments from the loosest

Parameters
  • cutoffs (list) – list of floating point numbers of the desired cutoffs values.

  • initial_view (dict) – mapping of the initial fragmentation that will be imposed starting from the loosest cutoff.

  • initial_groups (list) – groups of the initial view which will be first considered for fragmentation.

Returns

dictionary of the view. the values of the dictionary are

length-1 lists such that subsequent views (for instance found when instantiating a population) can be appended

Return type

dict

property fragment_purities

Define the quality of the fragmentation that is provided in the system

set_fragment_quantities(name, data)[source]

Set fragment quantities to the system.

Include in the biosystem a set of fragment quantities that are associated to the fragments. Such quantities can then be retrieved by the fragment_values method.

Parameters
  • name (str) – the name of the quantities to invoke

  • data (list, dict) –

    if a list, the quantities in order of fragment_names, otherwise a dictionary containing the

    per-fragment information

fragment_distances(target_fragments)[source]

Calculate the distance between each of the fragment and a list of target fragments.

Parameters

target_fragments (list) – Id of the fragments to focus on

Returns

list of the distances in the order of fragment_names

Return type

array-like

fragment_interaction_and_feedback(target, criteria, environment=None, view=None)[source]

Provides interaction with a target and its feedback.

This method calculates the fragment interactions between a target and completes the result with the feedback of such interaction that an environmental region provides on such target. :param target: Id of the fragments to focus on :type target: list :param criteria: may be ‘hamiltonian’, ‘electrostatic’,

or ‘bond_order’

Parameters
  • environment (list) – environmental region, separated from the target. If absent, the complementary region of the target is considered.

  • view (dict) – fragmentation view of the system. If present, target should be defined in the view.

Returns

list of the strengths of the interactions of the

fragments wrt to the target fragments. The order of the list is provided in terms of the fragment_names. The values of target fragments are associated to the feedback interaction with the environment

Return type

array-like

identify_contact_region(target, cutoff_bo=0.001, cutoff_el=10000000000.0, view=None, environment=None)[source]

Identify the contact with a target.

Single out the portion of the target fragments that interacts with the rest of the system.

Parameters
  • target (list) – fragments of the target region.

  • cutoff_bo (float) – the value of the Fragment Bond Order above which the fragments are considered to be interacting. Typical values range between 0.01 and 1.e-3.

  • cutoff_el (float) – cutoff for the electrostatic value (Ha). Fragments whose interaction is larger than such value are also included in the contact region.

  • view (dict) – the mapping of the superunits into fragments.

  • environment (list) – environmental region under which restrict the sum.

Returns

the list of the contact region as well as the value of the

bond order

Return type

tuple

calculate_contact_regions(target, cutoff_bo=0.001, cutoff_el=10000000000.0, name='', view=None, environment=None)[source]

Identify the contact with a target.

Single out the portion of the target fragments that interacts with the rest of the system, and store it.

Parameters
  • target (list) – fragments of the target region.

  • cutoff_bo (float) – the value of the Fragment Bond Order above which the fragments are considered to be interacting. Typical values range between 0.01 and 1.e-3.

  • cutoff_el (float) – cutoff for the electrostatic value (kcal/mol). Fragments whose interaction is larger than such value are also included in the contact region.

  • name (str) – Name of the region to be stored in the fragment_values.

  • view (dict) – mapping of the superunits into fragments.

  • environment (list) – the contact region is only calculated taking into accout those fragments.

extract_target_contact_and_counter_region(name='', environment=None)[source]

Extract the contact region in the target and counter-region.

Parameters
  • name (str) – name of the stored region to extract

  • environment (list) – fragment belonging to the environmental region. The entire complementary region will be considered if absent.

calculate_target_interactions(target, name='', include_dispersion=False, nthreads=1, environment=None, view=None)[source]

Calculate the interaction terms with a target.

This function stores the fragment quantities associated to the interaction term with a target.

Parameters
  • target (list) – list of the fragments of the target region

  • name (str) – Name of the region to be stored in the fragment values.

  • include_dispersion (bool) – if true, calculate also the dispersion interaction of the target with the environment. The dispersion is only calculated in the contact fragments. Therefore if this value it true it is assumeed that the contact regions have already been calculated and stored.

  • nthreads (int) – number of threads for the parallel evaluation of the dispersion

  • environment (list) – fragments which correspond to the environment region to be considered for the target. If absent, the complementary region will be considered.

  • view (dict) – mapping of the superunits into fragments

dispersion_interactions(region1, region2, nthreads=1, view=None)[source]

Van der Waals interaction strengths.

Extract the dictionary of the dispersion interactions Between two disjoint regions.

Parameters
  • region1 (list) – list of fragments of region 1

  • region2 (list) – list of fragments of region 2

  • nthreads (int) – number of omp threads to perform the calculation.

  • view (dict) – mapping of the superunits into fragments.

fragment_interaction_strengths(target_fragments, criteria='hamiltonian', view=None)[source]

Interactions between the different fragments of the system, possibly specified on target regions.

Parameters
  • target_fragments (list) – Id of the fragments to focus on. If a view is provided it should be expressed in terms of the fragments of the view.

  • criteria (str) – may be ‘hamiltonian’, ‘electrostatic’, or ‘bond_order’.

  • view (dict) – view of the fragmentation to be employed to evaluate the strength.

Returns

list of the strengths of the interactions of the

fragments wrt to the target fragments. If a view is provided the data of each fragment is replicated for each of the superunits belonging to the fragment.

Return type

array-like

d3PBE_energy(subsystem=None, erase_d3_files=True)[source]

Full dispersion energy of the system, or of a subsystem.

Parameters
  • subsystem (list) – fragment to focus on. If absent, the entire system is considered.

  • erase_d3_files (bool) – If true remove the d3 files once the calculation is completed.

Returns

the d3 PBE energy for the PBE functional, in Hartree.

Return type

float

system_target_interaction_investigation(target, nthreads, environment=None, **kwargs)[source]

Identify the different interactions of the system with a target. :param target: the list of the fragments defining the target. :type target: list :param nthreads: number of threads to calculate the vdW correction. :type nthreads: int :param environment: fragments which will be considered in the

environmental region. If absent the complementary region of the target will be considered.

Parameters

**kwargs – arguments of calculate_contact_regions

display_sequences(axs=None, labeldict=None, boxcount=None, with_fragment_labels=False, remove_letters=None, **kwargs)[source]

Display the full sequencies of the system. Employ the given field to define the color dict if available

Parameters
  • field_vals (list) – values of the field to decide the colors of the keys

  • labeldict (dict) – an optional dictionary mapping fragment ids to strings that will be printed on each sequence square.

  • colordict (dict) – the dictionary of the keys, and the corresponding colors. Overrides field_vals if present

  • axs (list) – list of the axis in which to plot the sequences

  • with_fragment_labels (bool) – a boolean useful to represent the fragment numbers. Can be combined with a view to visualize the QM fragments. Provides a default labeldict.

  • remove_letters (list) – list of the letters to be removed from the display. In elements of the chains.

  • **kwargs – arguments to be passed to display of the colordict

Returns

list of the axis of the sequences

Return type

list

display_graph(restrict_to=None, bo_cutoff=0.01, fragment_labels=None, fragment_shapes=None, ax=None, colorbar_label='', edge_labels=None, bond_orders=None, edge_colordict_kw=None, node_edge_colordict_kw=None, display_kwargs={}, **kwargs)[source]

Display a graph view of the system.

With this routine the fragments of the system are represented as a Graph, connected by the Fragment Bond Order criteria. Graph nodes can be represented as with a color map that is provided in a similar way than the sequence display.

Parameters
  • restrict_to (list) – list of fragments to which the graph has to be restricted.

  • bo_cutoff (float) – the limit for a non-negligible chemical link

  • fragment_labels (list, dict) – list (in fragment_names order) of the labels of the fragments or dictionary of the fragments which need relabelling.

  • fragment_shapes (dict) – per-fragment dictionary of the shapes to be employed in the graph drawing. Only non-default values are indicated. If absent, the nodes which belong to the restrict_to list will be represented as squares, and the others as circles.

  • edge_labels (dict) – dictionary of dictionaries: labels to be written next to the edges, with keys indicating the fragment names.

  • ax (matplotlib.pyplot.axes) – matplotlib axis

  • colorbar_label (str) – label to be associated to the colorbar. If None, no colorbar is drawn.

  • bond_orders (dict-like) – dictionary of the bond orders. Supersedes the systems’ bond order if present. Not compatible with a view.

  • edge_colordict_kw (dict) – dictionary of the keyword arguments of the edge colors. The same syntax of the colordict method is allowed, written in terms of the tuple of the fragments. An additional keyword argument, colordict_special, is also allowed. This one overrides the arguments provided by the colordict for the fragment tuples provided.

  • node_edge_colordict_kw (dict) – dictionary of the keyword arguments of the node border colors. The same syntax of the colordict method is allowed.

  • display_kwargs (dict) – dictionary of the keyword arguments to be passed to Graph.display method.

  • **kwargs – colordict arguments.

Returns

the axes of the displayed graph

Return type

matplotlib.pyplot.axes

refragmentation_colordict(refrag_keys, **kwargs)[source]

Define a colordict that associates the fragment which go together with the same color and put to white all the previously pure fragments.

Parameters
  • refrag_keys (list) – Keys of the refragmented system

  • **kwargs – keyword arguments of the initial colordict

Returns

the dictionary of the colors (equal color means same fragment)

Return type

dict

colordict(color_by=None, keys=None, field_vals=None, colordict=None, colorcode=None, highlight=None, view=None, vmin=None, vmax=None, highlight_color='red')[source]

Define the colordict to quantify the fragments according to the method.

Parameters
  • keys (list) – keys of the fragment names.

  • color_by (str) – can be ‘charge’, ‘dipole’, ‘purity’.

  • field_vals (list) – values of the field to decide the colors of the keys.

  • colordict (dict) – the dictionary of the keys, and the corresponding colors. Overrides field_vals if present

  • colorcode (str) –

    the string of the colorcode.

    It represents the colormap of matplotlib. Default is ‘seismic’ for diverging values (i.e. field_dict has negative data), otherwise ‘Reds’.

    highlight (list)color in red only the fragments which

    are indicated in the list.

    view (dict): the fragmentation of the system to aggregate

    the field_vals to.

    highlight_color (matplotlib.colors): color to be employed

    for highlighting.

labeldict(labels={}, shift={}, mappable=None, fmt=['num'], view=None, joinchar='+')[source]

Label the fragments with a recognizable string.

This function returns a dictionary to label the fragments of the system in a way that can be employed in plots or graphs.

Parameters
  • shift (dict) – dictionary of the shifts to be applied to each chain from the first residue ID, with keys corresponding to the ID of the chain to be shifted. Useful for sequence alignment.

  • mappable (func) – function that returns a string from a system’s fragment name.

  • labels (dict) – dictionary of explicit lables to be given to each fragment. Overrides all the other options for the indicated fragments.

  • view (dict) – a System fragments’ view. If present, fragment names are concatenated unless specified otherwise in mappable and label.

  • joinchar (str) – character to be used to join labels in case of a view.

  • fmt (list) – list indicating the characters that concatenate the residue attributes. The list should indicate “ch”, “res”, and “num” as the chain, residue name and number, respectively. “letter” can also be provided in case this is available.

Returns

A dictionary of the considered fragments with

the corresponding names as strings.

Return type

dict

fragment_values(criteria=None, view=None)[source]

Retrieve the fragment values that are associated to a given criteria. This is a commodity function made to make more intentional some data.

Parameters
  • criteria (str) – can can be ‘charge’, ‘dipole’, ‘purity’

  • view (dict) – the dictionary of the fragment view. If present, aggregate the values accordingly.

Returns

in order of the fragment names.

Return type

list

fragment_scatterplot(errors=None, ax=None, **kwargs)[source]

Defines the scatterplot to have a look to in order to interpret sequence data

Parameters
  • errors (list) – the errors of the field_vals in the order of fragment_names

  • ax (matplotlib.pyplot.Axes) – Axis of the matplotlib instance

  • **kwargs – the same arguments of the :py:func:colordict method

Returns

reference to the axis employed for the plot

Return type

matplotlib.pyplot.Axes

represent(errors=None, with3d=False, with_fragment_labels=False, boxcount=None, remove_letters=None, labeldict=None, **kwargs)[source]

Represent all the data of the systems, both in with the sequence and with the scatterplot. Useful only when there is data to be represented.

Parameters
  • with3d (bool) – if True represent the 3d vision of the system

  • errors (list) – the errors of the field_vals in the order of fragment_names

  • **kwargs – the same arguments of the py:meth:colordict method

Returns

list fo the axis of the plot

Return type

list

unmatched_frag_info()[source]

Retrieve the information about the fragments which have not been matched inside the sequence

Returns

a Dictionary of the fragment that have not been recognized

Return type

dict

charge_at_pH(ph, closed_shell=True)[source]

Identify the charge that the system should have for a given pH. Employ the ~py:meth:Bio.Bio.SeqUtils.ProtParam.ProteinAnalysis.charge_at_ph method of BioPython

Parameters
  • ph (float) – the value of the ph

  • closed_shell (bool) – defines if the charge should be rounded to the nearest even integer considering the total number of electrons of the system.

Returns

the value of the charge

Return type

float

charge_by_protonation()[source]

Identify the system’s charge by looking at the protonation state.

The protonation state of each amminoacid is only identified by the total number of atoms of the residue. It is user’s responsibility to check that the atoms are placed correctly.

Returns

charge of the system, list of residues for which the

charge cannot be determined.

Return type

tuple

override_fragment_association(fragment_names, new_names=None)[source]

Force the association of the fragment provided by the fragment_names list into the data represented by the new_names. New names should be provided as ‘Ch-RES:num’ where Ch is the chain, RES is the aminoacid residue at the position num.

Parameters
  • fragments_names (list) – list of the fragments to be reassociated. Should be present in the unmatched_fragment list.

  • new_names (list) – list of the new names to be produced, in fragment_names element order. If absent, the fragment names are employed.

graph_bond(fraglist, threshold, pairwise_bo, restrict_to=None)[source]

Defines the connectivity matrix of the fragments

Parameters
  • fraglist (list) – the list of the fragment names

  • threshold (float) – the threshold of the cumulative fragment bond order that is applied to define the connectivity

  • pairwise_bo (dict) – list of the fragment BO connections

  • restrict_to (list) – list of fragments to which the graph has to be restricted

Returns

connectivity matrix of the graph

Return type

(matrix)

get_BO_network(connectivity_matrix)[source]

Defines the connectivity matrix of the fragments :param connectivity_matrix: matrix of the connnectivities of

the system

Parameters

pairwise_bo (dict) – list of the fragment BO connections

Returns

graph,target_nodes,target_edges.

Graph to be plotted and the list of target nodes and edges identified by the restriction if present.

Return type

tuple

fragment_letters(fragnames, sequences, residue_list, sequences_to_residues)[source]

Give a list of the fragments in terms of their letter

Parameters
  • fragnames (list) – list of the names of the fragments

  • residue_list (list) – The fragments of the system in the order of the residues of the structure

  • residues (sequences_to) – strings of the protein sequence in the FASTA sequence order

  • sequences (list) – Sequences of the system

Returns

the labels of the fragments as letters. the label is set to ‘X’

if the letter is not recognized

Return type

list

residue_letter(ires, sequences, sequences_to_residues)[source]

Find the letter associated with the residue

Parameters
  • ires (int) – id of the residue

  • sequences (list) – Sequences of the system

  • residues (sequences_to) – strings of the protein sequence in the FASTA sequence order

Returns

letter of the residue

Return type

str

interaction_strengths(fragments, target_fragments, pairwise_bo)[source]

Define the interaction strengths of the fragment list with the others with respect to a set of target fragments.

Parameters
  • fragments (list) – the name of the fragments of the system in pairwise_bo.

  • target_fragments (list) – the fragments constituting the target region.

  • pairwise_bo (dict) – the bond order between fragments.

Returns

the sum of the weigths of the interactions of the fragments

in the target region.

Return type

array

find_relevant_fragments(sys, data, criteria, columns=None)[source]

Identify the fragments of the system that fulfill the criteria provided by the function in argument

Parameters
  • sys (BigDFT.System) – the System to identify

  • data (array, Dataframe, dict) – data to take care of. If ana array, it is considered in order of fragment_names. If it is a dataframe or a dict, the name of fragments are the keys.

  • criteria (func) – function that can be applied to the data items. The function is applied to each of the data elements and the criteria is assumed to be satisfied if any of the element fulfill the criteria

  • columns (list) – the fragment to focus the search to. Useful in the case of a dictionary

Returns

fragments that fulfill the criteria

Return type

list

class BioSystemPopulation(systems, representative, **kwargs)[source]

An ensemble of BioSystem. Useful to provide and analyze averaged quantities

Parameters
  • systems (dict, list) –

    BioSystems defining the populations. They can be provided as a dict of pdb files plus logfiles, or as a dictionary of systems, or as a dictionary of archives for serialization. The dictionary is composed as follows: ‘label’: {‘archive’: <archive_filename>,

    ’system’: <BioSystem instance>, ‘filename’: <pdbfile>, ‘logfile’: <logfile_path>, ‘weight’: weigth of the sample in the population, ‘mapping’: provides the expression of the fragment of

    the representative in term of the fragments of the item of the population}

    alternatively, a list can be employed, in which case the filling of the population is performed sequentially

  • representative (str, int) – label of the system representative of the population

  • exclude_representative (bool) – if True, the data of the population will be evaluated without considering the representative

  • fragment_values (list) – list of the internal fragment values that can be found inside each of the system. Can be left as an empty list to delegate all the evaluation to the populations.

  • fragment_interactions (list) – list of the attributes which correspond to the quantities which are associated to inter-fragment interactions.

  • to_evaluate (dict) – dictionary of functions to be employed to evaluate at the creation of a population. They must provide a quantity that is related to a system.

  • chessboard_cutoffs (list) – list of the cutoffs that will have to be employed for the fragmentation. Use an empty list to deactivate.

  • initial_view (dict) – view of the superunits fragmentation that will be employed to the representative in the lowest cutoff

  • **kwargs – arguments to be passed at the loading of the system

serialize(archive, chessboard_dict=None, **kwargs)[source]

Dump the entire population into an archive, containing the representative archive, if it is provided as such (otherwise the serialization of the system) as well as the numpy files of all the population data.

Parameters
  • archive (str) – archive to dump the population to

  • chessboard_dict (dict) – optional dictionary of the fragmentation of the population that has been previously found with a chessboard algorithm. If absent, the internal chessboard is employed

  • **kwargs – arguments to be passed to BioSystemSerialization instantiation in the case of a system.

classmethod load(archive, **kwargs)[source]

Create a BioSystemPopulation instance from a serialized archive.

Parameters
  • archive (str) – the path of the archive

  • **kwargs – other arguments that have to be passed to the BioSystemPopulation instantiation

Returns

a biosystem population instance

Return type

BioSystemPopulation

update_fragmentation(cutoff, view_old, sys)[source]

Update the fragmentation views starting from the last element of a list of views

Parameters
  • cutoff (float) – the cutoff to employ for the fragmentation

  • view_old (dict) – previous view of the system

  • sys (BioSystem) – the system employed to update the frament list

Returns

new view of the system

Return type

dict

reorder(array, of, with_respect_to, mapping=None)[source]

Reorder an array of data of systems in such a way that it can be applied to characterize another one.

Parameters
  • array (array-like, dict) – data in the order of of.fragment_names. in case of a dictionary, it is assumed that it contains the data coming from interactions or bond orders.

  • of (BioSystem) – the system associated to the original data

  • with_respect_to (BioSystem) – the system to which reorder the data

  • mapping (dict) – dictionary of the view that decomposes the fragments of with_respect_to in terms of the fragment of of in case of multiple elements in the values, an aggregation is considered. If mapping is present, there should not be undefined fragments in their values

Returns

data in the new order. It contains numpy.nan for the

fragments which are not present in the new system.

Return type

numpy.array

clean_view(view_orig, superunits)[source]

Define a view of the system that comes from a view of another one. Superunits are cleaned in such a way that no inconsistencies are present in the new view.

Parameters
  • view_orig (dict) – the original view we would like to clean

  • superunits (list) – list of the new system’s superunits

Returns

the new view of the system, where combined superunits are

joined with the ‘+’ symbol

Return type

dict

rename_residue(res)[source]

Rename the residue of the system such that alphabetical order corresponds to sequence order

Parameters

res (str) – residue of the system, in form “chain-resname:pos”

Returns

residue in the form “chain-pos:letter”

Return type

str

Warning

Only put letters to residues which have letter associated

rename_labels(labels, as_dict=False, from_dict=False, mappable=<function rename_residue>)[source]

Rename the names of the residues of a BioSystem in order to be compatible with other naming schemes.

Parameters
  • labels (iterable) – contains the list of residue to be renamed

  • as_dict (bool) – if True, returns a dictionary as a relabel

  • from_dict (bool) – if True assumes that the remapping has to be performed also on the labels values.

  • mappable (func) – the function that will be applied for renaming

Returns

dictionary or list of the renamed labels, depending on

the arguments.

Return type

dict, list